Abstract. We consider communication over memoryless
channels using low-density parity-check code
ensembles above the iterative (belief propagation) 
threshold. What is the computational complexity of 
decoding (i.e., of reconstructing all the typical in- 
put codewords for a given channel output) in this 
regime? We define an algorithm accomplishing this 
task and analyze its typical performance. The behav- 
ior of the new algorithm can be expressed in purely 
information-theoretical terms. Its analysis provides 
an alternative proof of the area theorem for the bi- 
nary erasure channel. Finally, we explain how the 
area theorem is generalized to arbitrary memoryless 
channels. We note that the recently discovered relation
between mutual information and minimal mean-square
error is an instance of the area theorem in the setting
of Gaussian channels. 

I. Introduction 

The analysis of iterative coding systems has been extremely 
effective in determining the conditions for successful commu- 
nication. The single most important prediction in this context 
is the existence of a threshold noise level below which the bit 
error rate vanishes (as the blocklength and the number of it- 
erations diverge). The threshold can be computed for a large 
variety of code ensembles using density evolution. 

On the other hand, understanding the behavior of these 
systems above threshold is largely an open issue. Since in this 
regime the bit error rate remains bounded away from zero, one 
may wonder about the motivation for such an investigation. 
We can think of three possible answers: (i) It is intellectually
frustrating to have a "half-complete" theory of iterative
decoding. Moreover, this theory has poor connections with
classical issues such as the behavior of the same codes under
maximum likelihood (ML) decoding. (ii) Loopy belief propagation
has stimulated considerable interest as a general-purpose
inference algorithm for graphical models. However,
there are very few applications where its effectiveness can be
analyzed mathematically. Decoding below threshold is probably
the most prominent of such examples and one may hope
to build upon this success. (iii) There are communication
contexts in which one is interested in reproducing some
information within a pre-established tolerance, rather than
exactly. There are indications that iterative methods can play
an important role also in such contexts. If this is the case, one
will necessarily operate in the above-threshold regime.

Consider, for the sake of simplicity, communication over 
a memoryless channel using random elements from a stan- 
dard low-density parity-check (LDPC) code ensemble. As- 
sume moreover that the noise level is greater than the thresh- 
old one. There are two natural theoretical problems one can 
address in this regime: (A) How many channel inputs correspond
to a given typical output? (B) How hard is it to
reconstruct all of them? Answering question (A) amounts to
computing the conditional entropy $H(X^n \mid Y^n)$ of the channel
input given the output (here $n$ is the blocklength). We expect
this entropy to become of order $O(n)$ at large enough
noise. We call the minimum noise level at which this is the case
the ML threshold. ML decoding is bound to fail above this
threshold.

The second question is apparently far from Information 
Theory and in any case very difficult to answer. The naive 
expectation would be that reconstructing all the typical code- 
words becomes harder as their conditional entropy gets larger. 

In this paper we report some recent progress on both of the 
questions outlined above. In Secs. II and III we reconsider the
binary erasure channel (BEC). We define a natural extension
of the belief propagation decoder which reconstructs all the
codewords compatible with a given channel output. The new 
algorithm ('Maxwell decoder') thus performs a 'complete' list 
decoding, and is based on the general message-passing phi- 
losophy. Below the iterative threshold, it coincides with belief 
propagation decoding and its complexity is linear in the block- 
length. Above the iterative threshold, its complexity becomes 
exponential. Its behavior can be analyzed precisely, and provides
answers to both questions (A) and (B) above (within this
circumscribed context). Surprisingly, the resulting picture is 
most easily conveyed using a well-known information theoretic 
characterization of the code: the EXIT curve. As a byprod- 
uct, we obtain an alternative proof of the area theorem for the 
BEC. 

The connection between the EXIT curve and the Maxwell decoder
is not a peculiarity of the binary erasure channel, and
has instead a rather fundamental origin. The algorithm progressively
reduces the uncertainty on the transmitted bits.
This can be regarded as an effective change of the noise level
of the communication channel. The EXIT curve describes the
response of the bits (i.e., the change of the bit uncertainty) to 
a change in the noise level. The area theorem is obtained when 
integrating this response: the total bit uncertainty at maxi- 
mal noise level (the code rate) is thus given by an integral of 
the EXIT curve. 

In Sec. IV we explain how to generalize these ideas to arbitrary
memoryless channels. In particular, we define a generalized
EXIT function GEXIT, which has the same important
properties as the usual one. We show that an area theorem
holds for such a function, implying, among other things, an 
upper bound on the ML threshold. GEXIT reduces to EXIT 
for the BEC and to the minimal mean-square error (MMSE) 
for additive Gaussian channels. 



II. Area Theorem for the Binary Erasure 
Channel 

Consider a degree distribution pair $(\lambda, \rho)$ and ensembles
LDPC$(n, \lambda, \rho)$ of increasing length $n$. Figure 1 shows a typical
asymptotic EXIT function (see footnote 2). Its main characteristics (for a
regular ensemble with left degree at least 3) are as follows:
The function is zero below the ML threshold $\epsilon_{\mathrm{ML}}$. It jumps at
$\epsilon_{\mathrm{ML}}$ to a non-zero value and then continues smoothly until it
reaches one for $\epsilon = 1$. The area under the EXIT curve equals
the rate of the code, see [?]. Compare this to the equivalent
function of the iterative (IT) decoder, which is also shown in
Fig. 1. It is easy to check that this curve is given in parametric
form by

$$\left( \frac{x}{\lambda(1-\rho(1-x))},\; \Lambda(1-\rho(1-x)) \right), \qquad (1)$$

where $x$ signifies the erasure probability of left-to-right messages
and $\Lambda$ denotes the variable-node degree distribution from the
node perspective. Equation (1) can be derived from the fixed-point
equation $\epsilon\lambda(1-\rho(1-x)) - x = 0$. We express $\epsilon$ as
$\epsilon(x) = x/\lambda(1-\rho(1-x))$
and notice that the average probability that a bit is still erased
(ignoring the observation of the bit itself) at the fixed point
is equal to $\Lambda(1-\rho(1-x))$. Note that the iterative curve is
the trace of this parametric equation for $x$ starting at $x = 1$
until $x = x_{\mathrm{IT}}$. This is the critical point and $\epsilon(x_{\mathrm{IT}}) = \epsilon_{\mathrm{IT}}$.
Summarizing, the iterative EXIT curve is zero up to the iterative
threshold $\epsilon_{\mathrm{IT}}$. It then jumps to a non-zero value and also
continues smoothly until it reaches one at $\epsilon = 1$. Multiple
jumps are possible in some irregular ensembles, but we shall
neglect this possibility here.
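
As a rough numerical illustration of the parametric form (1), the following Python sketch traces the iterative curve for the regular ensemble $\lambda(x) = x^2$, $\rho(x) = x^5$ of Fig. 1 and locates the iterative threshold as the minimum of $\epsilon(x)$. The function names and the grid search are our own illustrative choices, and $\Lambda(x) = x^3$ is the node-perspective distribution we assume for this regular ensemble.

```python
# Sketch only: regular ensemble with lambda(x) = x^2, rho(x) = x^5, Lambda(x) = x^3.
# All names below are illustrative and do not come from the paper.

def lam(y):
    """Edge-perspective variable degree distribution."""
    return y ** 2

def Lam(y):
    """Node-perspective variable degree distribution (assumed x^3 here)."""
    return y ** 3

def rho(y):
    """Edge-perspective check degree distribution."""
    return y ** 5

def eps_of_x(x):
    """Channel erasure probability for which x is a fixed point."""
    return x / lam(1.0 - rho(1.0 - x))

def exit_of_x(x):
    """Iterative EXIT value at the fixed point x."""
    return Lam(1.0 - rho(1.0 - x))

def iterative_threshold(grid=200000):
    """Return (x_IT, eps_IT): the minimum of eps(x) over a grid on (0, 1]."""
    best_x = min((k / grid for k in range(1, grid + 1)), key=eps_of_x)
    return best_x, eps_of_x(best_x)

x_it, eps_it = iterative_threshold()

# A few points (eps, EXIT) on the stable branch x_IT <= x <= 1.
stable_branch = [(eps_of_x(x), exit_of_x(x))
                 for x in (x_it + k * (1.0 - x_it) / 10 for k in range(11))]
```

The grid search is deliberately naive; any one-dimensional minimizer would do equally well.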

The following two curious relationships between these two
curves were shown in [?]. First, the IT and the ML curves
coincide above $\epsilon_{\mathrm{ML}}$. Second, the ML curve can be constructed
from the iterative curve in the following way. If we draw the IT
curve as parametrized in Eq. (1) not only for $x > x_{\mathrm{IT}}$ but also
for $0 < x < x_{\mathrm{IT}}$, we get the curve shown in the right picture
of Fig. 1. Notice that the branch $0 < x < x_{\mathrm{IT}}$ describes an
unstable fixed point under iterative decoding. Moreover, the
fraction of erased messages $x$ decreases along this branch when
the erasure probability is increased. Finally, it satisfies $x > \epsilon$.
Because of these peculiar features, it is usually considered as
"spurious".

2 The EXIT function is the function $\frac{1}{n}\sum_{i=1}^{n} H(X_i \mid Y_{[n]\setminus\{i\}})$, see [?].

To determine the ML threshold, take a straight vertical line
at $\epsilon = \epsilon_{\mathrm{IT}}$ and shift it to the right until the area which lies to
the left of this straight line and is enclosed by the line and the
iterative curve is equal to the area which lies to the right of the
line and is enclosed by the line and the iterative curve (these
areas are indicated in dark gray in the picture). This unique
point determines the ML threshold. The ML EXIT curve is
now the curve which is zero to the left of the threshold and
equals the iterative curve to the right of this threshold. In
other words, the ML threshold is determined by a balance
between two areas (see footnote 3).
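
The area rule can be sketched numerically. Since the ML EXIT curve vanishes below the ML threshold and its total area equals the rate, the threshold is the point at which the area under the stable branch of the iterative curve, from there up to $\epsilon = 1$, equals the rate. The Python sketch below applies this to $\lambda(x) = x^2$, $\rho(x) = x^5$; the design rate $1/2$, the approximate value $x_{\mathrm{IT}} \approx 0.2606$, the trapezoid rule and all names are assumptions made for illustration.

```python
# Sketch only, for lambda(x) = x^2, rho(x) = x^5 (design rate 1/2 assumed).
# Names, step counts and the starting value x_it are illustrative assumptions.

RATE = 0.5

def eps_of_x(x):
    """Erasure probability at which x is a fixed point of iterative decoding."""
    return x / (1.0 - (1.0 - x) ** 5) ** 2

def exit_of_x(x):
    """Iterative EXIT value at the fixed point x."""
    return (1.0 - (1.0 - x) ** 5) ** 3

def area(x_from, x_to, steps=20000):
    """Signed integral of EXIT d(eps) along the parametric curve (trapezoid rule)."""
    total = 0.0
    e0, h0 = eps_of_x(x_from), exit_of_x(x_from)
    for k in range(1, steps + 1):
        x = x_from + (x_to - x_from) * k / steps
        e1, h1 = eps_of_x(x), exit_of_x(x)
        total += 0.5 * (h0 + h1) * (e1 - e0)
        e0, h0 = e1, h1
    return total

def ml_threshold(x_it=0.2606, iterations=40):
    """Bisect on x in [x_IT, 1] until the area to the right equals the rate."""
    lo, hi = x_it, 1.0      # area(lo, 1) exceeds RATE, area(hi, 1) is zero
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if area(mid, 1.0) > RATE:
            lo = mid        # too much area: move the vertical line to the right
        else:
            hi = mid
    return eps_of_x(0.5 * (lo + hi))

eps_ml = ml_threshold()
```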
Figure 1: Left: The EXIT curve of the ML decoder for the degree
distribution pair $(\lambda(x) = x^2, \rho(x) = x^5)$. The curve is zero until
$\epsilon_{\mathrm{ML}}$, at which point it jumps. It then continues smoothly until
it reaches one at $\epsilon = 1$. Also shown is the equivalent curve under
iterative decoding. Right: The full iterative EXIT curve including
the "spurious branch". This corresponds to an unstable fixed point
$x > \epsilon$. The ML threshold is determined by the balance of the two
dark gray areas.



III. Maxwell Decoder 

The balance condition described above, cf. Fig. 1, is
strongly reminiscent of the so-called 'Maxwell construction'
in statistical mechanics [8]. This construction allows one, for instance, to
determine the location of a liquid-gas phase transition by balancing
two areas in the pressure-volume phase diagram. The
Maxwell construction is derived by considering a reversible
transformation between the liquid and vapor phases. The
balance condition follows from the observation that the net
work exchange along such a transformation must vanish at
the phase transition point.

Inspired by the statistical mechanics analogy, we shall explain
the balance condition determining the ML threshold
by analyzing an algorithm which moves from the non-zero-entropy
branch to the zero-entropy branch of the EXIT curve.
To this end we construct a fictitious decoder, which for obvious
reasons we name the Maxwell decoder. Instead of explaining
the balance between the areas as shown in Fig. 1, we will explain
the balance of the two areas shown in Fig. 2. Note that
these two areas differ from the previous ones only by a common
term, so that the condition for balance stays unchanged.

3 The ML threshold was first determined by the replica method
in [?]. Further, in [?] a simple counting argument leading to an
upper bound for this threshold was given. In this paper we take as
a starting point the point of view taken in [?].

Figure 2: EXIT curve for the $(x^2, x^5)$ ensemble. Dark gray: the
two areas whose balance is proved by the analysis of the Maxwell
decoder. Left: Total number of guesses made by a decoder starting
at $\epsilon_{\mathrm{ML}}$ (divided by the blocklength). Right: Total number of
contradictions encountered (divided by the blocklength).

Let us now introduce the decoder. Given the received word,
which was transmitted over the BEC($\epsilon$), the decoder proceeds
iteratively as does the standard message passing decoder. Any
time the iterative decoding process gets stuck in a non-empty
stopping set, the decoder randomly chooses a position
$i \in [n]$. If this position is not known yet, the decoder splits any
running copy of the decoding process into two: one which proceeds
with the decoding process by assuming that $x_i = 0$, and
one which proceeds by assuming that $x_i = 1$. This splitting
procedure is repeated any time the decoder gets stuck, and
we say that the decoder guesses a bit. During the decoding
it can happen that contradictions occur, i.e., that a variable
node receives inconsistent messages. Any copy of the decoding
process which contains such contradictions terminates. From
the above description it follows that at any given point of
the decoding process there are $2^{h(\ell)}$ copies alive, where $h(\ell)$
is a natural number which evolves with time $\ell$. Eventually,
each surviving copy will have determined all the erased bits,
and outputs the corresponding word of size $n$. It is hopefully
clear from the above description that the final list of surviving
copies is in one-to-one correspondence with the list of codewords
that are compatible with the received message. In other
words, the Maxwell decoder performs a complete list decoding
of the received message.
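
The decoder described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the authors' implementation: the code is given by a small dense parity-check matrix, the guessed position is the first erased one rather than a random one, running copies are processed one after the other instead of in parallel, and a contradiction is detected as a fully known check that is violated. The (7,4) Hamming matrix and all names are hypothetical examples.

```python
def peel(H, word):
    """Erasure message passing: fill in any erased bit that is the only erased
    bit of some check. Returns None if a fully known check is violated."""
    word = list(word)
    progress = True
    while progress:
        progress = False
        for row in H:
            members = [j for j, h in enumerate(row) if h]
            erased = [j for j in members if word[j] is None]
            parity = sum(word[j] for j in members if word[j] is not None) % 2
            if not erased and parity:
                return None              # contradiction: this copy terminates
            if len(erased) == 1:
                word[erased[0]] = parity
                progress = True
    return word


def maxwell_decode(H, received):
    """Return all codewords compatible with `received` (None marks an erasure)."""
    running = [list(received)]           # copies of the decoding process
    survivors = []
    while running:
        word = peel(H, running.pop())
        if word is None:
            continue                     # copy killed by a contradiction
        if None not in word:
            survivors.append(tuple(word))
            continue
        i = word.index(None)             # stuck in a stopping set: guess bit i
        for value in (0, 1):
            running.append(word[:i] + [value] + word[i + 1:])
    return sorted(survivors)


# Hypothetical example: a parity-check matrix of the (7,4) Hamming code.
H_HAMMING = [[1, 1, 0, 1, 1, 0, 0],
             [1, 0, 1, 1, 0, 1, 0],
             [0, 1, 1, 1, 0, 0, 1]]
received = [None, None, None, None, 0, 0, 0]
codeword_list = maxwell_decode(H_HAMMING, received)
```

Each guess doubles the number of copies and each contradiction removes one, so the length of the work list plays the role of $2^{h(\ell)}$ in the text.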

In Fig. 3 an instance of the decoding process
is shown from the perspective of the various simultaneous
copies. The initial phase coincides with standard message
passing: a single copy of the process decodes a bit at a time.
After three steps, belief propagation gets stuck in a stopping
set and several steps of guessing follow. During this phase
$h(\ell)$ (the associated entropy, i.e., the $\log_2$ of the number of
simultaneously running copies) increases. After this guessing
phase, the standard message passing phase resumes. More
and more copies will terminate due to inconsistent messages
(incorrect guesses). At the end, only one copy survives, which
shows that the example has a unique ML solution.



Figure 3: Maxwell decoder applied to a simple example when the
all-zero codeword is decoded.

In Fig. 4 we plot the entropy $h(\ell)$ as a function of the
number of iterations for several code and channel realizations
(here we consider a (3, 6) ensemble with blocklength $n = 10^4$
and erasure probability $\epsilon = 0.47$). It can be shown that the
rescaled entropy $h(\ell)/n$ concentrates around a finite limiting
value if we take the large blocklength limit $n \to \infty$, with $\ell/n$
fixed. Moreover, the limiting curve can be computed exactly.
Here we limit ourselves to outlining the connection with the
various areas highlighted in Fig. 2 and to explaining why these
areas should be in balance at the ML threshold. To simplify
matters, consider only channel parameters $\epsilon$ with $\epsilon > \epsilon_{\mathrm{IT}}$. We
claim that the total number of guesses one has to venture
during the guessing phase of the algorithm is equal to the
dark gray area shown in the left picture of Fig. 2, i.e., it is
equal to the integral under the iterative curve from $\epsilon_{\mathrm{IT}}$ up to
$\epsilon$.

The effect of the guesses is to bring the effective erasure
probability down from $\epsilon$ to $\epsilon_{\mathrm{IT}}$. At this point the standard
message passing decoder can resume. The guesses are now
resolved in the following manner. Assume that at some point
in time there is a variable node which has $d$ connected check
nodes of degree one. The corresponding incoming messages
have to be consistent. This gives rise to $d - 1$ constraints,
or in other words, only a fraction $2^{-(d-1)}$ of the running copies
survives. It can now be shown that the total number of such
constraints which are imposed is equal to the area in the right
picture of Fig. 2. At the ML threshold all guesses have to be
resolved at the end of the decoding process. This implies that
the total number of required guesses has to equal the total
number of resolved guesses, which implies an equality of the
areas, as promised!



Figure 4: Entropy of the Maxwell decoder (logarithm of the number
of running copies) as a function of the number of determined
bits. We plot the results for several channel and code realizations
(here for a (3, 6) ensemble with blocklength $n = 10^4$ and $\epsilon = 0.47$)
together with the analytical asymptotic curve. In the inset: how
the asymptotic curve can be constructed from the EXIT function.


Notice that the Maxwell decoder plays the same role as a 
reversible transformation in thermodynamics. 
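
The balance between guesses and contradictions discussed in this section can be checked numerically for the ensemble $\lambda(x) = x^2$, $\rho(x) = x^5$. The Python sketch below compares the area under the stable branch of the iterative curve between the iterative and ML thresholds (the guesses) with the area under the unstable branch beyond the iterative threshold (the contradictions). Identifying the right-hand area of Fig. 2 with the whole area under the unstable branch is our reading of the figure, and the design rate $1/2$, the cut-off at $x = 10^{-6}$ and all names are illustrative assumptions.

```python
# Sketch only, for lambda(x) = x^2, rho(x) = x^5 (design rate 1/2 assumed).

def eps_of_x(x):
    return x / (1.0 - (1.0 - x) ** 5) ** 2

def exit_of_x(x):
    return (1.0 - (1.0 - x) ** 5) ** 3

def area(x_from, x_to, steps=20000):
    """Signed integral of EXIT d(eps) along the parametric curve (trapezoid rule)."""
    total = 0.0
    e0, h0 = eps_of_x(x_from), exit_of_x(x_from)
    for k in range(1, steps + 1):
        x = x_from + (x_to - x_from) * k / steps
        e1, h1 = eps_of_x(x), exit_of_x(x)
        total += 0.5 * (h0 + h1) * (e1 - e0)
        e0, h0 = e1, h1
    return total

def argmin_eps(lo=0.05, hi=0.9, iterations=200):
    """Ternary search for x_IT, assuming eps(x) has a single minimum on [lo, hi]."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if eps_of_x(a) < eps_of_x(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def x_at_ml(x_it, rate=0.5, iterations=40):
    """Bisect for the stable-branch point whose area up to eps = 1 equals the rate."""
    lo, hi = x_it, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if area(mid, 1.0) > rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_it = argmin_eps()
x_ml = x_at_ml(x_it)
# Guesses per bit: area under the stable branch between eps_IT and eps_ML.
guesses = area(x_it, x_ml)
# Contradictions per bit: area under the unstable branch, followed from x_IT
# down to a small cut-off (eps grows along this direction).
contradictions = area(x_it, 1e-6, steps=200000)
```

In this sketch the two numbers agree up to the accuracy of the quadrature, which is the balance condition at the ML threshold.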

IV. General Channel 

Three important lessons can be learned from the BEC example
treated in the previous sections. First, the EXIT
curve gives the change in conditional entropy of the transmitted
message when the channel noise level is incremented
by an infinitesimal amount. Second, in a search algorithm
reconstructing all the typical input codewords, this change
has to be compensated for by an increase of the algorithm
entropy. This is the fundamental reason for the equality between
the area under the stable branch of the EXIT curve
and the number of guesses made by the Maxwell decoder.
Third, the fact that the iterative EXIT curve extends below
the maximum-likelihood one implies that the corresponding
additional guesses must eventually be resolved. The unstable
branch of the EXIT curve yields the number of contradictions
found in this resolution stage.

The first step towards a generalization of this scenario to
an arbitrary memoryless channel consists in finding the appropriate
generalization of the EXIT curve. We obtain such a generalization
by enforcing the first of the above properties. For
the sake of definiteness, we assume both the input and output
alphabets to be finite and denote by $Q(y \mid x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, the
transition probability. Formulae for continuous alphabets are
easily obtained by replacing the sums $\sum_x$, $\sum_y$ with the
integrals $\int dx$, $\int dy$. We moreover denote by $w$ a generic noise-level
parameter and assume $Q(y \mid x)$ to be differentiable with respect
to $w$. In analytical calculations, it is convenient to distinguish
the noise levels for each channel use, $w_i$, $i \in [n]$. The
time-invariant channel is recovered by setting $w_1 = \dots = w_n = w$.
Finally, we denote by $\underline{X} = X^n$ the channel input and by $\underline{Y} = Y^n$
the channel output. Our definition of a generalized EXIT curve
is

$$\mathrm{GEXIT} = \frac{1}{n} \frac{d}{dw} H(\underline{X} \mid \underline{Y}). \qquad (2)$$



Notice that GEXIT satisfies the area theorem by construction: 
our purpose is to get a manageable expression for it. It is 
convenient to think of the above differentiation as acting on 
each channel separately 



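$$\mathrm{GEXIT} = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial w_i} H(\underline{X} \mid \underline{Y}) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{GEXIT}_i, \qquad (3)$$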



with $\mathrm{GEXIT}_i$ defined as the derivative with respect to $w_i$.
In order to compute $\mathrm{GEXIT}_i$, it is convenient to isolate the
contribution of $X_i$ to the conditional entropy. If we denote
by $Z_i$ the extrinsic information at $i$, and use the shorthands
$X^{[i]} = (X_j : j \in [n]\setminus i)$, $Y^{[i]} = (Y_j : j \in [n]\setminus i)$, we get

$$H(\underline{X} \mid \underline{Y}) = H(X_i \mid Z_i, Y_i) + H(X^{[i]} \mid X_i, Y^{[i]}). \qquad (4)$$

This is obtained by a standard application of the entropy chain 
rule 

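$$\begin{aligned}
H(\underline{X} \mid \underline{Y}) &= H(X_i \mid \underline{Y}) + H(X^{[i]} \mid X_i, \underline{Y}) \\
&= H(X_i \mid \underline{Y}, Z_i) + H(X^{[i]} \mid X_i, Y^{[i]}) \\
&= H(X_i \mid Y_i, Y^{[i]}, Z_i) + H(X^{[i]} \mid X_i, Y^{[i]}) \\
&= H(X_i \mid Y_i, Z_i) + H(X^{[i]} \mid X_i, Y^{[i]}).
\end{aligned}$$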

We remark at this point that only the first term of the decomposition
(4) depends upon the channel at position $i$. Therefore

$$\mathrm{GEXIT}_i = \frac{\partial}{\partial w_i} H(X_i \mid Z_i, Y_i). \qquad (5)$$

It is convenient to obtain a more explicit expression for the 
above formula. To this end we write 



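$$H(X_i \mid Z_i, Y_i) = -\sum_{x_i, y_i, z_i} P(z_i) P(x_i \mid z_i) Q(y_i \mid x_i) \log \left\{ \frac{P(x_i \mid z_i) Q(y_i \mid x_i)}{\sum_{x_i'} P(x_i' \mid z_i) Q(y_i \mid x_i')} \right\}. \qquad (6)$$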



The dependence of $H(X_i \mid Z_i, Y_i)$ upon the channel at position
$i$ is completely explicit and we can differentiate. The terms
obtained by differentiating with respect to the channel inside
the log vanish. For instance, when differentiating with respect
to the $Q(y_i \mid x_i)$ at the numerator, we get

$$\sum_{x_i, y_i, z_i} P(z_i) P(x_i \mid z_i) \frac{d}{dw_i} Q(y_i \mid x_i) = 0,$$

since $\sum_{y_i} Q(y_i \mid x_i) = 1$ for every $x_i$.

We thus proved the following

Theorem 1 With the above definitions

$$\mathrm{GEXIT}_i = \sum_{x_i, y_i, z_i} P(x_i) P(z_i \mid x_i) Q'(y_i \mid x_i) \log \left\{ \frac{\sum_{x_i'} P(x_i' \mid z_i) Q(y_i \mid x_i')}{P(x_i \mid z_i) Q(y_i \mid x_i)} \right\}, \qquad (7)$$

where we denoted by $Q'(y \mid x)$ the derivative of the channel transition
probability with respect to the noise level $w$.
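
As a sanity check of Theorem 1, the following Python sketch compares the right-hand side of Eq. (7) with a finite-difference derivative of $H(X_i \mid Z_i, Y_i)$ in a toy model of our own choosing: a uniform binary input, a channel BSC($p$) with noise parameter $w = p$, and extrinsic information modeled as a second, independent BSC($q$) observation of the same bit. The model, the parameter values and all function names are illustrative assumptions.

```python
from math import log

def bsc(a, b, flip):
    """Transition probability of a BSC with the given flip probability."""
    return 1.0 - flip if a == b else flip

def d_bsc(a, b):
    """Derivative of bsc(a, b, p) with respect to p."""
    return -1.0 if a == b else 1.0

def log_ratio(x, y, z, p, q):
    """log of sum_x' P(x'|z)Q(y|x') over P(x|z)Q(y|x).
    With a uniform input, P(x|z) = bsc(x, z, q)."""
    norm = sum(bsc(xp, z, q) * bsc(y, xp, p) for xp in (0, 1))
    return log(norm / (bsc(x, z, q) * bsc(y, x, p)))

def cond_entropy(p, q):
    """H(X_i | Z_i, Y_i) in nats, by direct enumeration as in Eq. (6)."""
    return sum(0.5 * bsc(z, x, q) * bsc(y, x, p) * log_ratio(x, y, z, p, q)
               for x in (0, 1) for y in (0, 1) for z in (0, 1))

def gexit_theorem(p, q):
    """Right-hand side of Eq. (7) with w = p."""
    return sum(0.5 * bsc(z, x, q) * d_bsc(y, x) * log_ratio(x, y, z, p, q)
               for x in (0, 1) for y in (0, 1) for z in (0, 1))

def gexit_numeric(p, q, step=1e-6):
    """Central finite difference of the conditional entropy with respect to p."""
    return (cond_entropy(p + step, q) - cond_entropy(p - step, q)) / (2.0 * step)

formula_value = gexit_theorem(0.1, 0.2)
numeric_value = gexit_numeric(0.1, 0.2)
```

The two values agree up to the finite-difference error, which reflects the fact that the terms coming from the derivative inside the logarithm cancel.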

The interest of the above result is that it encapsulates all 
our ignorance about the code behavior into the distribution 
of extrinsic information P(zi). This is in turn the natural 
object appearing in message passing algorithms and in density 
evolution analysis. In order to fully appreciate the meaning of 
Eq. (7), it is convenient to consider a couple of more specific
examples. 

Linear Codes over BMS Channels 

We assume the code to be linear and to be used over a 
binary-input memoryless output-symmetric (BMS) channel. 
We furthermore denote the channel input alphabet by $\mathcal{X} = \{0, 1\}$. This
is the most common setting in the analysis of iterative coding
systems. Exploiting the channel symmetry we can fix $x_i = 0$
in Eq. (7) and get

$$\mathrm{GEXIT}_i = \sum_{y_i, z_i} P_0(z_i) Q'(y_i \mid 0) \log \left\{ 1 + \frac{P(1 \mid z_i) Q(y_i \mid 1)}{P(0 \mid z_i) Q(y_i \mid 0)} \right\}, \qquad (8)$$

where we defined $P_0(z_i)$ to be the distribution of the extrinsic
information at $i$ under the condition that the all-zero codeword
has been transmitted. Recall that $Z_i$ is a function of
$Y^{[i]}$ and $P_0(z_i)$ is the distribution induced on $Z_i$ by the
distribution of $Y^{[i]}$.

It is convenient to encode the extrinsic information $z_i$ as an
extrinsic log-likelihood ratio $l_i = \log[P(0 \mid z_i) / P(1 \mid z_i)]$. Analogously,
we define $L_Q(y) = \log[Q(y \mid 0) / Q(y \mid 1)]$. Finally, we
denote by $a^{(i)}(l)$ the density of $l_i$ with respect to the Lebesgue
measure. We thus get the following handy expression.

Corollary 1 For a linear code over a BMS channel

$$\mathrm{GEXIT}_i = \int_{-\infty}^{+\infty} a^{(i)}(l)\, k_L^{Q}(l)\, dl, \qquad (9)$$

where we introduced the GEXIT kernel

$$k_L^{Q}(l) = \sum_{y} Q'(y \mid 0) \log\left(1 + e^{-L_Q(y) - l}\right). \qquad (10)$$

It is worth recalling that the usual EXIT curve has a similar
expression. In fact

$$\mathrm{EXIT}_i = \int_{-\infty}^{+\infty} a^{(i)}(l)\, k_L(l)\, dl, \qquad (11)$$

with the channel-independent EXIT kernel $k_L(l) = \log(1 + e^{-l})$.
Finally, we notice that it is possible to use alternative
encodings for the extrinsic information. One important
possibility is to work in the so-called 'difference domain'
$z = \tanh(l/2)$. The new kernel will be given by
$k_D(z) = k_L(2\tanh^{-1}(z))$, and similarly for the GEXIT kernel.
It is moreover possible to exploit
the symmetry property of $a^{(i)}(l)$ to get

$$\mathrm{GEXIT}_i = \int_{0}^{+\infty} a^{(i)}(l)\, k_{|L|}^{Q}(l)\, dl, \qquad (12)$$

where $k_{|L|}^{Q}(l) = k_L^{Q}(l) + e^{-l} k_L^{Q}(-l)$. Analogously, one can consider
an 'absolute difference' kernel $k_{|D|}(z)$.
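
The kernels and the changes of domain just described are easy to evaluate. The Python sketch below implements the EXIT kernel, a generic GEXIT kernel in the form of Eq. (10), the folding $k(l) + e^{-l}k(-l)$ and the difference-domain map, and applies them to a symmetric two-point LLR density (that of a bit observed through a BSC with flip probability $q$). We assume logarithms in base 2, so that entropies are in bits; the representation of a channel as a list of pairs and all names are our own conventions.

```python
from math import atanh, exp, log, tanh

def exit_kernel(l):
    """Channel-independent EXIT kernel log2(1 + e^{-l}) (base 2 assumed)."""
    return log(1.0 + exp(-l), 2)

def gexit_kernel(l, channel):
    """Eq. (10): `channel` lists (Q'(y|0), L_Q(y)) for each output y with Q'(y|0) != 0."""
    return sum(dq * log(1.0 + exp(-llr - l), 2) for dq, llr in channel)

def folded(kernel, l):
    """|L|-domain kernel k(l) + e^{-l} k(-l), to be integrated over l >= 0."""
    return kernel(l) + exp(-l) * kernel(-l)

def difference_domain(kernel, z):
    """Kernel expressed in the difference domain z = tanh(l/2)."""
    return kernel(2.0 * atanh(z))

# Symmetric two-point density: masses 1-q at +L and q at -L, with L = log((1-q)/q).
q = 0.2
L = log((1.0 - q) / q)
exit_full = (1.0 - q) * exit_kernel(L) + q * exit_kernel(-L)   # integral over all l
exit_half = (1.0 - q) * folded(exit_kernel, L)                 # integral over l >= 0
z = tanh(L / 2.0)
exit_kernel_in_d = difference_domain(exit_kernel, z)           # same value as exit_kernel(L)
```

For this density the EXIT value is the binary entropy of $q$, which gives a simple consistency check of the kernel.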

Let us work out a couple of examples. In order to compare
the different cases, it is useful to define a unified convention for
the noise level parameter $w$. We choose $w$ to be the channel
entropy, or, in other words, one minus the channel capacity:
$w = 1 - C(Q)$ (in bits).

For the BEC we have $\mathcal{Y} = \{0, 1, *\}$ and the transition probabilities
read $Q(0 \mid 0) = 1 - \epsilon$, $Q(* \mid 0) = \epsilon$, $Q(1 \mid 0) = 0$. Obviously
$w = \epsilon$ and $Q'(0 \mid 0) = -1$, $Q'(* \mid 0) = 1$, $Q'(1 \mid 0) = 0$. We
get

$$k_L^{\mathrm{BEC}}(l) = \log(1 + e^{-l}). \qquad (13)$$

Therefore $k_L^{\mathrm{BEC}}(l) = k_L(l)$ and $\mathrm{GEXIT}_i = \mathrm{EXIT}_i$. We thus
recovered a well known result: the EXIT curve satisfies the
area theorem for the BEC.
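
The identity between GEXIT and EXIT on the BEC can be verified by brute force on a small code. The Python sketch below uses the (7,4) Hamming code purely as a hypothetical example: it computes $H(\underline{X} \mid \underline{Y})$ by enumerating erasure patterns, differentiates it numerically with respect to $\epsilon$, and compares the result with the average of the bitwise extrinsic entropies. The code choice, the bit-mask representation and all names are our own assumptions.

```python
from math import log

# Hypothetical example code: the (7,4) Hamming code, used only for illustration.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
N = 7
CODEWORDS = [w for w in range(1 << N)
             if all(sum((w >> j) & 1 for j in range(N) if row[j]) % 2 == 0
                    for row in H)]

def weight(mask):
    return bin(mask).count("1")

def count_inside(erased):
    """Number of codewords whose support lies inside the erased set (bit mask)."""
    return sum(1 for c in CODEWORDS if (c & ~erased) == 0)

def cond_entropy(eps):
    """H(X|Y) in bits over the BEC(eps): average log2 of the compatible list size."""
    return sum(eps ** weight(e) * (1.0 - eps) ** (N - weight(e))
               * log(count_inside(e), 2) for e in range(1 << N))

def exit_value(eps):
    """Average over i of H(X_i | all other outputs): bit i stays undetermined iff
    some codeword with x_i = 1 is supported on the erased positions plus i."""
    total = 0.0
    for i in range(N):
        for e in range(1 << N):
            if (e >> i) & 1:
                continue                 # e ranges over subsets of the other positions
            allowed = e | (1 << i)
            if any((c & ~allowed) == 0 and (c >> i) & 1 for c in CODEWORDS):
                total += eps ** weight(e) * (1.0 - eps) ** (N - 1 - weight(e))
    return total / N

def gexit_value(eps, step=1e-6):
    """(1/n) dH(X|Y)/d(eps), by a central finite difference."""
    return (cond_entropy(eps + step) - cond_entropy(eps - step)) / (2.0 * step * N)

gexit_at_03 = gexit_value(0.3)
exit_at_03 = exit_value(0.3)
```

By construction the area under the GEXIT curve is $(H(1) - H(0))/n$, which for this example is the rate $4/7$.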
Figure 5: Difference-domain kernels $k_D^{\mathrm{BSC}}$ and $k_D$ for GEXIT
and EXIT curves on the BSC. They should be multiplied by the
extrinsic information distribution in the $D$-domain and integrated
over $(-1, 1)$. Left: flip probability $p = 0.1$. Right: $p = 0.3$.

Figure 6: Absolute difference domain kernels $k_{|D|}^{\mathrm{BSC}}$ and $k_{|D|}$ for
GEXIT and EXIT curves on the BSC. They should be multiplied
by the extrinsic information distribution in the $|D|$-domain and
integrated over $[0, 1)$. Left: flip probability $p = 0.1$. Right: $p = 0.3$.

Consider now the BSC with flip probability $p$. Proceeding
as above, we get

$$k_L^{\mathrm{BSC}}(l) = \frac{1}{\log\frac{1-p}{p}} \left[ \log\left(1 + \frac{1-p}{p}\, e^{-l}\right) - \log\left(1 + \frac{p}{1-p}\, e^{-l}\right) \right]. \qquad (14)$$

In Figs. 5 and 6 we plot the GEXIT kernel in the difference
and absolute difference domains, comparing it with the usual
EXIT one. In Fig. 7 we plot the EXIT and GEXIT curves for a
few regular LDPC ensembles over the BSC.
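
The BSC kernel can be cross-checked against the general definition of the GEXIT kernel. In the Python sketch below, the closed form (14) is compared with Eq. (10) evaluated for the BSC, where the noise parameter is the channel entropy $w = h_2(p)$ in bits, so that $Q'(y \mid 0) = (dQ(y \mid 0)/dp)/(dw/dp)$ with $dw/dp = \log_2[(1-p)/p]$. Base-2 logarithms in Eq. (10) and the function names are assumptions of this sketch.

```python
from math import exp, log

def bsc_kernel_closed_form(l, p):
    """Eq. (14); the ratio of logarithms does not depend on the base."""
    r = (1.0 - p) / p
    return (log(1.0 + r * exp(-l)) - log(1.0 + exp(-l) / r)) / log(r)

def bsc_kernel_from_definition(l, p):
    """Eq. (10) for the BSC with w = h2(p): sum over y of Q'(y|0) log2(1 + e^{-L_Q(y) - l})."""
    dw_dp = log((1.0 - p) / p, 2)                     # derivative of h2(p), in bits
    terms = [(-1.0 / dw_dp, log((1.0 - p) / p)),      # y = 0: (Q'(0|0), L_Q(0))
             (+1.0 / dw_dp, log(p / (1.0 - p)))]      # y = 1: (Q'(1|0), L_Q(1))
    return sum(dq * log(1.0 + exp(-llr - l), 2) for dq, llr in terms)

kernel_at_zero = bsc_kernel_closed_form(0.0, 0.1)     # no extrinsic information
```

At $l = 0$ the kernel equals one, as it should: without extrinsic information, changing the channel entropy changes the bit entropy by the same amount.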

Figure 7: GEXIT (solid curves) versus EXIT (dashed curves) for
several regular LDPC ensembles over the BSC.

From these examples it should be clear that computing
the GEXIT curve is not harder than computing the EXIT one.
The difference between these two curves is often quantitatively
small (cf. for instance Fig. 7). Nevertheless, such a difference is
definitely nonzero and it is not hard to show that an
area theorem cannot hold for the EXIT curve. Finally, several
qualitative properties remain unchanged. In particular:

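Lemma 1 Given a density $a(l)$ over the reals, let

$$\mathrm{GEXIT}_{\mathrm{BSC}}(a) = \int_{-\infty}^{+\infty} a(l)\, k_L^{\mathrm{BSC}}(l)\, dl. \qquad (15)$$

If the density $b(l)$ is physically degraded with respect to $a(l)$,
then $\mathrm{GEXIT}_{\mathrm{BSC}}(b) \ge \mathrm{GEXIT}_{\mathrm{BSC}}(a)$.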
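
A small numerical illustration of the Lemma, under assumptions of our own: we take as extrinsic densities the two-point LLR densities of a bit observed through a BSC with flip probability $q$, which form a family ordered by physical degradation as $q$ grows towards $1/2$, and evaluate the functional (15) on them with the kernel (14). The channel flip probability $p = 0.1$, the grid of values of $q$ and all names are arbitrary illustrative choices.

```python
from math import exp, log

def bsc_gexit_kernel(l, p):
    """GEXIT kernel of the BSC with flip probability p, Eq. (14)."""
    r = (1.0 - p) / p
    return (log(1.0 + r * exp(-l)) - log(1.0 + exp(-l) / r)) / log(r)

def gexit_bsc(density, p):
    """Eq. (15) for a discrete density given as (llr, mass) pairs."""
    return sum(mass * bsc_gexit_kernel(l, p) for l, mass in density)

def bsc_llr_density(q):
    """LLR density of a bit seen through a BSC(q), all-zero codeword assumed."""
    L = log((1.0 - q) / q)
    return [(L, 1.0 - q), (-L, q)]

p = 0.1
qs = [0.05 * k for k in range(1, 10)]                 # 0.05, 0.10, ..., 0.45
values = [gexit_bsc(bsc_llr_density(q), p) for q in qs]
```

In this sketch the values increase with $q$, in agreement with the monotonicity stated in the Lemma.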

An important application of the above Lemma consists in
approximating the correct extrinsic LLR densities with the
site-averaged belief propagation density. This yields an upper
bound on the GEXIT curve:

$$\mathrm{GEXIT} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{GEXIT}_i = \mathrm{GEXIT}_{\mathrm{BSC}}\left( \frac{1}{n} \sum_{i=1}^{n} a^{(i)} \right) \le \mathrm{GEXIT}_{\mathrm{BSC}}(a^{\mathrm{BP},k}),$$

where we denoted by $a^{\mathrm{BP},k}$ the belief propagation density after
$k$ iterations. We can now take the $n \to \infty$ limit and
(afterwards) the $k \to \infty$ limit to get

$$\mathrm{GEXIT} \le \mathrm{GEXIT}_{\mathrm{BSC}}(a^{\mathrm{DE},*}),$$

where $a^{\mathrm{DE},*}$ is the density at the density evolution fixed point.
We obtain therefore the following

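Corollary 2 Consider communication over the BSC using
LDPC$(n, \lambda, \rho)$ ensembles of rate $R$, and let $p_{\mathrm{ML,DE}}$ be defined
by

$$R = \int_{p_{\mathrm{ML,DE}}}^{1/2} \mathrm{GEXIT}_{\mathrm{BSC}}(a^{\mathrm{DE},*})\, dw(p), \qquad (16)$$

with $a^{\mathrm{DE},*}$ the density at the density evolution fixed point at
flip probability $p$, and $w(p)$ the channel entropy at flip probability $p$.
Let moreover $p_{\mathrm{ML}}$ be the maximum likelihood
threshold, defined as the smallest noise level such that the
ensemble-averaged conditional entropy $\mathbb{E} H(\underline{X} \mid \underline{Y})$ is linear in
the blocklength. Then

$$p_{\mathrm{ML}} \le p_{\mathrm{ML,DE}}. \qquad (17)$$

Example 1 For the (3, 6) ensemble and the BSC, the previous
method gives $p_{\mathrm{ML}} \le 0.101$.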



Gaussian Channels 

We assume $\mathcal{X} = \mathcal{Y} = \mathbb{R}$ and

$$Q(y \mid x) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( y - \sqrt{\mathrm{snr}}\, x \right)^2 \right\}. \qquad (18)$$

Notice that, in this case, $Q(y \mid x)$ should be interpreted as a
density with respect to Lebesgue measure. An alternative
formulation of the same channel model consists in saying that
$Y = \sqrt{\mathrm{snr}}\, X + W$ with $W$ a standard Gaussian variable. It is
also useful to define the minimal mean-square error $\mathrm{MMSE}_i$
in estimating $X_i$ as follows

$$\mathrm{MMSE}_i = E_{Y_i, Z_i} \left\{ E_{X_i}[x_i^2 \mid y_i, z_i] - E_{X_i}[x_i \mid y_i, z_i]^2 \right\}, \qquad (19)$$

where $E_{X,Y,\dots}$ denotes expectation with respect to the variables
$\{X, Y, \dots\}$. Finally, we take the signal-to-noise ratio as
the noise parameter entering in the definition of the GEXIT
curve: $w = \mathrm{snr}$. The reader will easily translate the results to
other choices of $w$ by a change of variable.

As recently shown by Guo, Shamai and Verdú [5], the
derivative with respect to the signal-to-noise ratio of the mutual
information across a Gaussian channel is related to the
minimal mean-square error. Adapting their result to the
present context, we immediately obtain the following

Corollary 3 For the additive Gaussian channel defined
above, we have

$$\mathrm{GEXIT}_i = -\frac{1}{2}\, \mathrm{MMSE}_i. \qquad (20)$$



For the convenience of the reader we briefly recall the
derivation [5] of this result from the expression (7). In order
to keep things simple, we shall consider here the case of a
single symbol with input density $P(x)$ transmitted uncoded
through the channel. The generalization is immediate. We
rewrite Eq. (7) in the single-symbol case as

$$\mathrm{GEXIT} = \iint P(x)\, Q'(y \mid x) \log \left\{ \frac{\int P(x') Q(y \mid x')\, dx'}{P(x) Q(y \mid x)} \right\} dx\, dy. \qquad (21)$$

It is convenient to group at this point a couple of remarks
which simplify the calculations. First,

$$Q'(y \mid x) = -\frac{x}{2\sqrt{\mathrm{snr}}}\, \frac{\partial}{\partial y} Q(y \mid x). \qquad (22)$$

Second,

$$\frac{1}{\sqrt{\mathrm{snr}}}\, \frac{d}{dy} E_X[x \mid y] = E_X[x^2 \mid y] - E_X[x \mid y]^2. \qquad (23)$$

Both of these formulae are obtained through simple calculus.
In order to prove Eq. (20) we use (22) in Eq. (21) and integrate
by parts with respect to $y$. After re-ordering the various terms,
we get

$$\mathrm{GEXIT} = \frac{1}{2\sqrt{\mathrm{snr}}} \iint E_X[x \mid y]\, P(x)\, \frac{\partial}{\partial y} Q(y \mid x)\, dx\, dy. \qquad (24)$$

At this point we integrate by parts once more with respect to
$y$ and use (23) to get the desired result.
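
The relation (20) can be checked numerically in the single-symbol case. The Python sketch below does so for an input uniform on $\{-1, +1\}$, which is our own choice of example: it computes $H(X \mid Y)$ and the MMSE by numerical integration over $y$ and compares a finite-difference derivative of the entropy with respect to snr against $-\mathrm{MMSE}/2$. Entropies are in nats; the integration range, step sizes and function names are illustrative assumptions.

```python
from math import exp, log, pi, sqrt, tanh

def phi(t):
    """Standard Gaussian density."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def binary_entropy_nats(prob):
    if prob <= 0.0 or prob >= 1.0:
        return 0.0
    return -prob * log(prob) - (1.0 - prob) * log(1.0 - prob)

def integrate(f, lo=-12.0, hi=12.0, steps=4000):
    """Plain trapezoid rule; the range is adequate for moderate snr only."""
    h = (hi - lo) / steps
    return h * (0.5 * f(lo) + sum(f(lo + k * h) for k in range(1, steps)) + 0.5 * f(hi))

def output_density(y, snr):
    """Density of Y = sqrt(snr) X + W for X uniform on {-1, +1}."""
    s = sqrt(snr)
    return 0.5 * (phi(y - s) + phi(y + s))

def cond_entropy(snr):
    """H(X|Y) in nats; the posterior of X = +1 is 1 / (1 + exp(-2 sqrt(snr) y))."""
    s = sqrt(snr)
    return integrate(lambda y: output_density(y, snr)
                     * binary_entropy_nats(1.0 / (1.0 + exp(-2.0 * s * y))))

def mmse(snr):
    """E[(X - E[X|Y])^2], using E[X|y] = tanh(sqrt(snr) y)."""
    s = sqrt(snr)
    return integrate(lambda y: output_density(y, snr) * (1.0 - tanh(s * y) ** 2))

def gexit_numeric(snr, step=1e-4):
    """dH(X|Y)/d(snr) by a central finite difference."""
    return (cond_entropy(snr + step) - cond_entropy(snr - step)) / (2.0 * step)

lhs = gexit_numeric(1.0)
rhs = -0.5 * mmse(1.0)
```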

Notice that the strikingly simple relation (20) was recently
used in an iterative coding setting by Bhattad and
Narayanan [10].